Overview

Dataset statistics

Number of variables: 22
Number of observations: 84548
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.2 MiB
Average record size in memory: 176.0 B
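The memory figures above fit a shallow count of 8 bytes per cell. The sketch below reproduces them under two assumptions that the report does not state. First, every cell costs 8 bytes: an int64/float64 value or an object pointer, without measuring the strings themselves. Second, the index adds 128 bytes. Both constants are assumptions and were not read from the report.

```python
# Sketch: reproduce the overview memory figures under ASSUMED constants.
N_ROWS = 84548          # Number of observations
N_COLUMNS = 22          # Number of variables
BYTES_PER_CELL = 8      # assumption: shallow count, 8 bytes per value/pointer
INDEX_BYTES = 128       # assumption: size attributed to the row index


def column_bytes(n_rows: int) -> int:
    """Per-column size (values plus index), as under each 'Memory size'."""
    return n_rows * BYTES_PER_CELL + INDEX_BYTES


def frame_bytes(n_rows: int, n_columns: int) -> int:
    """Whole-frame size: all column values plus the index counted once."""
    return n_columns * n_rows * BYTES_PER_CELL + INDEX_BYTES


per_column_kib = column_bytes(N_ROWS) / 1024
total_mib = frame_bytes(N_ROWS, N_COLUMNS) / 1024**2
avg_record_bytes = frame_bytes(N_ROWS, N_COLUMNS) / N_ROWS

print(f"{per_column_kib:.1f} KiB per column")   # 660.7 KiB per column
print(f"{total_mib:.1f} MiB in total")          # 14.2 MiB in total
print(f"{avg_record_bytes:.1f} B per record")   # 176.0 B per record
```

Because this count is shallow, the real footprint of string-heavy columns such as ADDRESS is larger than 660.7 KiB.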

Variable types

Numeric: 8
Categorical: 14

Alerts

EASE-MENT has constant value " " [Constant]
NEIGHBORHOOD has high cardinality: 254 distinct values [High cardinality]
BUILDING CLASS AT PRESENT has high cardinality: 167 distinct values [High cardinality]
ADDRESS has high cardinality: 67563 distinct values [High cardinality]
APARTMENT NUMBER has high cardinality: 3989 distinct values [High cardinality]
LAND SQUARE FEET has high cardinality: 6062 distinct values [High cardinality]
GROSS SQUARE FEET has high cardinality: 5691 distinct values [High cardinality]
BUILDING CLASS AT TIME OF SALE has high cardinality: 166 distinct values [High cardinality]
SALE PRICE has high cardinality: 10008 distinct values [High cardinality]
SALE DATE has high cardinality: 364 distinct values [High cardinality]
BLOCK is highly overall correlated with Unnamed: 0 and 1 other field [High correlation]
ZIP CODE is highly overall correlated with BOROUGH and 2 other fields [High correlation]
RESIDENTIAL UNITS is highly overall correlated with TOTAL UNITS [High correlation]
TOTAL UNITS is highly overall correlated with RESIDENTIAL UNITS and 1 other field [High correlation]
BOROUGH is highly overall correlated with Unnamed: 0 and 4 other fields [High correlation]
BUILDING CLASS CATEGORY is highly overall correlated with BOROUGH and 5 other fields [High correlation]
TAX CLASS AT PRESENT is highly overall correlated with BOROUGH and 3 other fields [High correlation]
TAX CLASS AT TIME OF SALE is highly overall correlated with BUILDING CLASS CATEGORY and 1 other field [High correlation]
Unnamed: 0 is highly overall correlated with BOROUGH and 1 other field [High correlation]
LOT is highly overall correlated with BUILDING CLASS CATEGORY [High correlation]
COMMERCIAL UNITS is highly overall correlated with TOTAL UNITS [High correlation]
YEAR BUILT is highly overall correlated with BUILDING CLASS CATEGORY [High correlation]
RESIDENTIAL UNITS is highly skewed (γ1 = 60.70273283) [Skewed]
COMMERCIAL UNITS is highly skewed (γ1 = 214.4011234) [Skewed]
TOTAL UNITS is highly skewed (γ1 = 63.44833684) [Skewed]
ZIP CODE has 982 (1.2%) zeros [Zeros]
RESIDENTIAL UNITS has 24783 (29.3%) zeros [Zeros]
COMMERCIAL UNITS has 79429 (93.9%) zeros [Zeros]
TOTAL UNITS has 19762 (23.4%) zeros [Zeros]
YEAR BUILT has 6970 (8.2%) zeros [Zeros]
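Each percentage in the alerts is count / 84548, rounded to one decimal place. Anything below 0.1% is shown as "< 0.1%". The minimal sketch below assumes that display rule, which is inferred from the report rather than taken from the pandas-profiling source.

```python
N_ROWS = 84548  # Number of observations from the overview


def fmt_percent(fraction: float) -> str:
    """Assumed display rule: one decimal place, '< 0.1%' for tiny non-zero values."""
    if 0 < fraction < 0.001:
        return "< 0.1%"
    return f"{fraction * 100:.1f}%"


# Zero counts copied from the Zeros alerts above.
ZERO_COUNTS = {
    "ZIP CODE": 982,
    "RESIDENTIAL UNITS": 24783,
    "COMMERCIAL UNITS": 79429,
    "TOTAL UNITS": 19762,
    "YEAR BUILT": 6970,
}

for column, zeros in ZERO_COUNTS.items():
    print(f"{column} has {zeros} ({fmt_percent(zeros / N_ROWS)}) zeros")
```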

Reproduction

Analysis started: 2022-12-11 06:35:24.118623
Analysis finished: 2022-12-11 06:35:39.544157
Duration: 15.43 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ)

Distinct: 26736
Distinct (%): 31.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10344.36
Minimum: 4
Maximum: 26739
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 4
5-th percentile: 849
Q1: 4231
Median: 8942
Q3: 15987.25
95-th percentile: 23281
Maximum: 26739
Range: 26735
Interquartile range (IQR): 11756.25
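Q3 = 15987.25 is not an observed integer value. That suggests the quantiles are interpolated linearly between order statistics, which is the default interpolation of pandas' `Series.quantile`. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which applies the same linear rule, and derives the range and IQR from the result. For this column, IQR = 15987.25 − 4231 = 11756.25 and range = 26739 − 4 = 26735, matching the table.

```python
import statistics


def quantile_summary(values):
    """Q1, median, Q3, IQR and range via linear interpolation between
    order statistics (statistics.quantiles, method='inclusive')."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "IQR": q3 - q1,
        "range": max(values) - min(values),
    }


print(quantile_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
# {'Q1': 3.25, 'median': 5.5, 'Q3': 7.75, 'IQR': 4.5, 'range': 9}
```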

Descriptive statistics

Standard deviation: 7151.7794
Coefficient of variation (CV): 0.69136994
Kurtosis: -0.92822006
Mean: 10344.36
Median Absolute Deviation (MAD): 5586.5
Skewness: 0.44078076
Sum: 8.7459494 × 10⁸
Variance: 51147949
Monotonicity: Not monotonic
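The descriptive statistics follow standard definitions:

- CV is the standard deviation divided by the mean (7151.7794 / 10344.36 ≈ 0.6914).
- Variance is the squared standard deviation.
- MAD is the median of the absolute deviations from the median.

The pure-Python sketch below computes all of them. For skewness and kurtosis it assumes the bias-adjusted estimators used by pandas' `Series.skew()` and `Series.kurt()`, with kurtosis reported as excess kurtosis (0 for a normal distribution).

```python
import math
import statistics


def describe(values):
    """Descriptive statistics; skewness/kurtosis use the adjusted estimators
    assumed to match pandas' Series.skew() and Series.kurt()."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for adjusted kurtosis")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    # Biased central moments feeding the adjusted estimators.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": mad,
        "skewness": skew,
        "kurtosis": kurt,
    }


print(describe([1, 2, 3, 4, 5]))
```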
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
4 | 5 | < 0.1%
4699 | 5 | < 0.1%
4710 | 5 | < 0.1%
4709 | 5 | < 0.1%
4708 | 5 | < 0.1%
4707 | 5 | < 0.1%
4706 | 5 | < 0.1%
4705 | 5 | < 0.1%
4704 | 5 | < 0.1%
4703 | 5 | < 0.1%
Other values (26726) | 84498 | 99.9%

Minimum 10 values
Value | Count | Frequency (%)
4 | 5 | < 0.1%
5 | 5 | < 0.1%
6 | 5 | < 0.1%
7 | 5 | < 0.1%
8 | 5 | < 0.1%
9 | 5 | < 0.1%
10 | 5 | < 0.1%
11 | 5 | < 0.1%
12 | 5 | < 0.1%
13 | 5 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
26739 | 1 | < 0.1%
26738 | 1 | < 0.1%
26737 | 1 | < 0.1%
26736 | 1 | < 0.1%
26735 | 1 | < 0.1%
26734 | 1 | < 0.1%
26733 | 1 | < 0.1%
26732 | 1 | < 0.1%
26731 | 1 | < 0.1%
26730 | 1 | < 0.1%

BOROUGH
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
4 | 26736
3 | 24047
1 | 18306
5 | 8410
2 | 7049

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 84548
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
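The category names in these tables (Decimal Number, Uppercase Letter, Space Separator, and so on) are Unicode general categories. The sketch below counts them with the standard library's `unicodedata.category`. It only approximates the report's character analysis: it does not cover scripts or blocks, because `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Po": "Other Punctuation",
    "Sm": "Math Symbol", "Sk": "Modifier Symbol",
}


def category_counts(values):
    """Count every character of every value by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["UPPER EAST SIDE (59-79)"]))
```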

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words
Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

Most occurring characters

Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 84548 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 84548 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 84548 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
4 | 26736 | 31.6%
3 | 24047 | 28.4%
1 | 18306 | 21.7%
5 | 8410 | 9.9%
2 | 7049 | 8.3%

NEIGHBORHOOD
Categorical

Distinct: 254
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
FLUSHING-NORTH | 3068
UPPER EAST SIDE (59-79) | 1736
UPPER EAST SIDE (79-96) | 1590
UPPER WEST SIDE (59-79) | 1439
BEDFORD STUYVESANT | 1436
Other values (249) | 75279

Length

Max length: 25
Median length: 20
Mean length: 13.144983
Min length: 4
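Total characters is the sum of the string lengths, so it equals mean length × number of observations. For NEIGHBORHOOD, 13.144983 × 84548 ≈ 1,111,382, the figure given under Characters and Unicode. The minimal sketch of the length summary below assumes that lengths are plain `len()` counts, including any leading or trailing spaces.

```python
import statistics


def length_summary(values):
    """Max/median/mean/min string length and total characters, using len()."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


print(length_summary(["ASTORIA", "BAYSIDE", "FOREST HILLS"]))
```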

Characters and Unicode

Total characters: 1111382
Distinct characters: 38
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: ALPHABET CITY
2nd row: ALPHABET CITY
3rd row: ALPHABET CITY
4th row: ALPHABET CITY
5th row: ALPHABET CITY

Common Values

Value | Count | Frequency (%)
FLUSHING-NORTH | 3068 | 3.6%
UPPER EAST SIDE (59-79) | 1736 | 2.1%
UPPER EAST SIDE (79-96) | 1590 | 1.9%
UPPER WEST SIDE (59-79) | 1439 | 1.7%
BEDFORD STUYVESANT | 1436 | 1.7%
MIDTOWN EAST | 1418 | 1.7%
BOROUGH PARK | 1245 | 1.5%
ASTORIA | 1216 | 1.4%
BAYSIDE | 1150 | 1.4%
FOREST HILLS | 1069 | 1.3%
Other values (244) | 69181 | 81.8%
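The short table at the top of each categorical section lists the five most frequent values, followed by an "Other values (k)" row that lumps together the rest. For NEIGHBORHOOD, that row covers 254 − 5 = 249 remaining values and 84548 − 9269 = 75279 rows. The sketch below reproduces that aggregation with `collections.Counter`; the function name `top_values` is hypothetical.

```python
from collections import Counter


def top_values(values, n=5):
    """Hypothetical helper: the n most common values plus an 'Other values (k)' row."""
    counts = Counter(values)
    rows = counts.most_common(n)
    shown = sum(count for _, count in rows)
    remaining = len(counts) - len(rows)
    if remaining:
        rows.append((f"Other values ({remaining})", len(values) - shown))
    return rows


print(top_values(["A", "A", "A", "B", "B", "C"], n=1))
# [('A', 3), ('Other values (2)', 3)]
```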

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
east | 6664 | 4.4%
side | 6484 | 4.3%
upper | 6471 | 4.3%
park | 6273 | 4.1%
heights | 4268 | 2.8%
west | 4034 | 2.7%
59-79 | 3175 | 2.1%
flushing-north | 3068 | 2.0%
hill | 2695 | 1.8%
bay | 2646 | 1.7%
Other values (285) | 106120 | 69.9%
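The lower-case table after the length histogram counts words. Its row "59-79 3175" equals UPPER EAST SIDE (59-79) plus UPPER WEST SIDE (59-79) (1736 + 1439). That suggests values are split on whitespace and each token is stripped of surrounding punctuation. The sketch below follows that assumed rule. Tokens made only of punctuation, such as a standalone "-", become empty after stripping and are dropped here.

```python
import string
from collections import Counter


def word_counts(values):
    """Approximate word frequencies: lower-case, split on whitespace,
    strip leading/trailing punctuation, drop tokens that end up empty."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts


names = ["UPPER EAST SIDE (59-79)", "UPPER WEST SIDE (59-79)", "FLUSHING-NORTH"]
print(word_counts(names).most_common(3))
```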

Most occurring characters

Value | Count | Frequency (%)
E | 104344 | 9.4%
A | 80142 | 7.2%
S | 79590 | 7.2%
R | 72558 | 6.5%
" " | 67350 | 6.1%
I | 64910 | 5.8%
O | 62467 | 5.6%
T | 62184 | 5.6%
L | 61705 | 5.6%
N | 60529 | 5.4%
Other values (28) | 395603 | 35.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 983670 | 88.5%
Space Separator | 67350 | 6.1%
Decimal Number | 25141 | 2.3%
Dash Punctuation | 19371 | 1.7%
Close Punctuation | 6182 | 0.6%
Open Punctuation | 6182 | 0.6%
Other Punctuation | 3486 | 0.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 104344 | 10.6%
A | 80142 | 8.1%
S | 79590 | 8.1%
R | 72558 | 7.4%
I | 64910 | 6.6%
O | 62467 | 6.4%
T | 62184 | 6.3%
L | 61705 | 6.3%
N | 60529 | 6.2%
H | 51508 | 5.2%
Other values (16) | 283733 | 28.8%

Decimal Number
Value | Count | Frequency (%)
9 | 11951 | 47.5%
7 | 5769 | 22.9%
6 | 3365 | 13.4%
5 | 3175 | 12.6%
1 | 826 | 3.3%
0 | 55 | 0.2%

Other Punctuation
Value | Count | Frequency (%)
/ | 2230 | 64.0%
. | 1256 | 36.0%

Space Separator
Value | Count | Frequency (%)
" " | 67350 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 19371 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6182 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6182 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 983670 | 88.5%
Common | 127712 | 11.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 104344 | 10.6%
A | 80142 | 8.1%
S | 79590 | 8.1%
R | 72558 | 7.4%
I | 64910 | 6.6%
O | 62467 | 6.4%
T | 62184 | 6.3%
L | 61705 | 6.3%
N | 60529 | 6.2%
H | 51508 | 5.2%
Other values (16) | 283733 | 28.8%

Common
Value | Count | Frequency (%)
" " | 67350 | 52.7%
- | 19371 | 15.2%
9 | 11951 | 9.4%
) | 6182 | 4.8%
( | 6182 | 4.8%
7 | 5769 | 4.5%
6 | 3365 | 2.6%
5 | 3175 | 2.5%
/ | 2230 | 1.7%
. | 1256 | 1.0%
Other values (2) | 881 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1111382 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
E | 104344 | 9.4%
A | 80142 | 7.2%
S | 79590 | 7.2%
R | 72558 | 6.5%
" " | 67350 | 6.1%
I | 64910 | 5.8%
O | 62467 | 5.6%
T | 62184 | 5.6%
L | 61705 | 5.6%
N | 60529 | 5.4%
Other values (28) | 395603 | 35.6%

BUILDING CLASS CATEGORY
Categorical

Distinct: 47
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
01 ONE FAMILY DWELLINGS | 18235
02 TWO FAMILY DWELLINGS | 15828
13 CONDOS - ELEVATOR APARTMENTS | 12989
10 COOPS - ELEVATOR APARTMENTS | 12902
03 THREE FAMILY DWELLINGS | 4384
Other values (42) | 20210

Length

Max length: 44
Median length: 43
Mean length: 43.000509
Min length: 43

Characters and Unicode

Total characters: 3635607
Distinct characters: 36
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 07 RENTALS - WALKUP APARTMENTS
2nd row: 07 RENTALS - WALKUP APARTMENTS
3rd row: 07 RENTALS - WALKUP APARTMENTS
4th row: 07 RENTALS - WALKUP APARTMENTS
5th row: 07 RENTALS - WALKUP APARTMENTS

Common Values

Value | Count | Frequency (%)
01 ONE FAMILY DWELLINGS | 18235 | 21.6%
02 TWO FAMILY DWELLINGS | 15828 | 18.7%
13 CONDOS - ELEVATOR APARTMENTS | 12989 | 15.4%
10 COOPS - ELEVATOR APARTMENTS | 12902 | 15.3%
03 THREE FAMILY DWELLINGS | 4384 | 5.2%
07 RENTALS - WALKUP APARTMENTS | 3466 | 4.1%
09 COOPS - WALKUP APARTMENTS | 2767 | 3.3%
04 TAX CLASS 1 CONDOS | 1656 | 2.0%
44 CONDO PARKING | 1441 | 1.7%
15 CONDOS - 2-10 UNIT RESIDENTIAL | 1281 | 1.5%
Other values (37) | 9599 | 11.4%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
family | 38447 | 10.3%
dwellings | 38447 | 10.3%
(blank) | 35824 | 9.6%
apartments | 33432 | 8.9%
elevator | 26273 | 7.0%
01 | 18235 | 4.9%
one | 18235 | 4.9%
condos | 16978 | 4.5%
coops | 16870 | 4.5%
two | 15828 | 4.2%
Other values (102) | 115289 | 30.8%

Most occurring characters

Value | Count | Frequency (%)
" " | 1746496 | 48.0%
E | 165470 | 4.6%
L | 163874 | 4.5%
A | 162055 | 4.5%
O | 141686 | 3.9%
T | 129584 | 3.6%
N | 127409 | 3.5%
S | 125167 | 3.4%
I | 90952 | 2.5%
R | 76041 | 2.1%
Other values (26) | 706873 | 19.4%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 1746496 | 48.0%
Uppercase Letter | 1672138 | 46.0%
Decimal Number | 178488 | 4.9%
Dash Punctuation | 38292 | 1.1%
Other Punctuation | 193 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 165470 | 9.9%
L | 163874 | 9.8%
A | 162055 | 9.7%
O | 141686 | 8.5%
T | 129584 | 7.7%
N | 127409 | 7.6%
S | 125167 | 7.5%
I | 90952 | 5.4%
R | 76041 | 4.5%
M | 74296 | 4.4%
Other values (13) | 415604 | 24.9%

Decimal Number
Value | Count | Frequency (%)
0 | 63426 | 35.5%
1 | 54500 | 30.5%
2 | 21413 | 12.0%
3 | 19069 | 10.7%
4 | 7517 | 4.2%
7 | 5345 | 3.0%
9 | 3386 | 1.9%
5 | 2784 | 1.6%
6 | 560 | 0.3%
8 | 488 | 0.3%

Space Separator
Value | Count | Frequency (%)
" " | 1746496 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 38292 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 193 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1963469 | 54.0%
Latin | 1672138 | 46.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 165470 | 9.9%
L | 163874 | 9.8%
A | 162055 | 9.7%
O | 141686 | 8.5%
T | 129584 | 7.7%
N | 127409 | 7.6%
S | 125167 | 7.5%
I | 90952 | 5.4%
R | 76041 | 4.5%
M | 74296 | 4.4%
Other values (13) | 415604 | 24.9%

Common
Value | Count | Frequency (%)
" " | 1746496 | 88.9%
0 | 63426 | 3.2%
1 | 54500 | 2.8%
- | 38292 | 2.0%
2 | 21413 | 1.1%
3 | 19069 | 1.0%
4 | 7517 | 0.4%
7 | 5345 | 0.3%
9 | 3386 | 0.2%
5 | 2784 | 0.1%
Other values (3) | 1241 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3635607 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 1746496 | 48.0%
E | 165470 | 4.6%
L | 163874 | 4.5%
A | 162055 | 4.5%
O | 141686 | 3.9%
T | 129584 | 3.6%
N | 127409 | 3.5%
S | 125167 | 3.4%
I | 90952 | 2.5%
R | 76041 | 2.1%
Other values (26) | 706873 | 19.4%

TAX CLASS AT PRESENT
Categorical

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
1 | 38633
2 | 30919
4 | 6140
2A | 2521
2C | 1915
Other values (6) | 4420

Length

Max length: 2
Median length: 1
Mean length: 1.0959692
Min length: 1

Characters and Unicode

Total characters: 92662
Distinct characters: 8
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2A
2nd row: 2
3rd row: 2
4th row: 2B
5th row: 2A

Common Values

Value | Count | Frequency (%)
1 | 38633 | 45.7%
2 | 30919 | 36.6%
4 | 6140 | 7.3%
2A | 2521 | 3.0%
2C | 1915 | 2.3%
1A | 1444 | 1.7%
1B | 1234 | 1.5%
2B | 814 | 1.0%
" " | 738 | 0.9%
1C | 186 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
1 | 38633 | 46.1%
2 | 30919 | 36.9%
4 | 6140 | 7.3%
2a | 2521 | 3.0%
2c | 1915 | 2.3%
1a | 1444 | 1.7%
1b | 1234 | 1.5%
2b | 814 | 1.0%
1c | 186 | 0.2%
3 | 4 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
1 | 41497 | 44.8%
2 | 36169 | 39.0%
4 | 6140 | 6.6%
A | 3965 | 4.3%
C | 2101 | 2.3%
B | 2048 | 2.2%
" " | 738 | 0.8%
3 | 4 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 83810 | 90.4%
Uppercase Letter | 8114 | 8.8%
Space Separator | 738 | 0.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 41497 | 49.5%
2 | 36169 | 43.2%
4 | 6140 | 7.3%
3 | 4 | < 0.1%

Uppercase Letter
Value | Count | Frequency (%)
A | 3965 | 48.9%
C | 2101 | 25.9%
B | 2048 | 25.2%

Space Separator
Value | Count | Frequency (%)
" " | 738 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 84548 | 91.2%
Latin | 8114 | 8.8%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 41497 | 49.1%
2 | 36169 | 42.8%
4 | 6140 | 7.3%
" " | 738 | 0.9%
3 | 4 | < 0.1%

Latin
Value | Count | Frequency (%)
A | 3965 | 48.9%
C | 2101 | 25.9%
B | 2048 | 25.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 92662 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 41497 | 44.8%
2 | 36169 | 39.0%
4 | 6140 | 6.6%
A | 3965 | 4.3%
C | 2101 | 2.3%
B | 2048 | 2.2%
" " | 738 | 0.8%
3 | 4 | < 0.1%

BLOCK
Real number (ℝ)

Distinct: 11566
Distinct (%): 13.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4237.219
Minimum: 1
Maximum: 16322
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 276
Q1: 1322.75
Median: 3311
Q3: 6281
95-th percentile: 11615.65
Maximum: 16322
Range: 16321
Interquartile range (IQR): 4958.25

Descriptive statistics

Standard deviation: 3568.2634
Coefficient of variation (CV): 0.84212391
Kurtosis: 0.59689403
Mean: 4237.219
Median Absolute Deviation (MAD): 2212
Skewness: 1.049335
Sum: 3.5824839 × 10⁸
Variance: 12732504
Monotonicity: Not monotonic
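Monotonicity reports whether a column's values are sorted in row order. The sketch below shows the check. Only "Not monotonic" actually appears in this report, so the labels for the sorted cases are assumptions.

```python
def monotonicity(values):
    """Classify a sequence by row order; labels other than 'Not monotonic' are assumed."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(monotonicity([5066, 16, 2135]))  # Not monotonic
```

The strict checks run first so that a strictly sorted column gets the more specific label.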
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
5066 | 404 | 0.5%
16 | 255 | 0.3%
2135 | 211 | 0.2%
4978 | 187 | 0.2%
1171 | 181 | 0.2%
8489 | 170 | 0.2%
1226 | 168 | 0.2%
3944 | 152 | 0.2%
31 | 135 | 0.2%
1129 | 135 | 0.2%
Other values (11556) | 82550 | 97.6%

Minimum 10 values
Value | Count | Frequency (%)
1 | 26 | < 0.1%
3 | 5 | < 0.1%
5 | 1 | < 0.1%
6 | 2 | < 0.1%
7 | 2 | < 0.1%
8 | 3 | < 0.1%
10 | 10 | < 0.1%
13 | 2 | < 0.1%
15 | 25 | < 0.1%
16 | 255 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
16322 | 1 | < 0.1%
16319 | 1 | < 0.1%
16317 | 3 | < 0.1%
16316 | 2 | < 0.1%
16315 | 2 | < 0.1%
16313 | 2 | < 0.1%
16310 | 2 | < 0.1%
16305 | 4 | < 0.1%
16304 | 3 | < 0.1%
16300 | 2 | < 0.1%

LOT
Real number (ℝ)

Distinct: 2627
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 376.22401
Minimum: 1
Maximum: 9106
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 22
Median: 50
Q3: 1001
95-th percentile: 1403
Maximum: 9106
Range: 9105
Interquartile range (IQR): 979

Descriptive statistics

Standard deviation: 658.13681
Coefficient of variation (CV): 1.7493216
Kurtosis: 24.937658
Mean: 376.22401
Median Absolute Deviation (MAD): 38
Skewness: 3.5006793
Sum: 31808988
Variance: 433144.07
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
1 | 4125 | 4.9%
20 | 983 | 1.2%
12 | 972 | 1.1%
40 | 935 | 1.1%
23 | 911 | 1.1%
10 | 895 | 1.1%
15 | 894 | 1.1%
29 | 891 | 1.1%
25 | 879 | 1.0%
19 | 874 | 1.0%
Other values (2617) | 72189 | 85.4%

Minimum 10 values
Value | Count | Frequency (%)
1 | 4125 | 4.9%
2 | 742 | 0.9%
3 | 811 | 1.0%
4 | 685 | 0.8%
5 | 805 | 1.0%
6 | 837 | 1.0%
7 | 830 | 1.0%
8 | 787 | 0.9%
9 | 783 | 0.9%
10 | 895 | 1.1%

Maximum 10 values
Value | Count | Frequency (%)
9106 | 1 | < 0.1%
9099 | 1 | < 0.1%
9085 | 1 | < 0.1%
9081 | 1 | < 0.1%
9080 | 1 | < 0.1%
9056 | 1 | < 0.1%
9053 | 2 | < 0.1%
9050 | 1 | < 0.1%
9049 | 1 | < 0.1%
9040 | 1 | < 0.1%

EASE-MENT
Categorical

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
" " | 84548

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 84548
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: " "
2nd row: " "
3rd row: " "
4th row: " "
5th row: " "

Common Values

Value | Count | Frequency (%)
" " | 84548 | 100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Words
No values found.

Most occurring characters

Value | Count | Frequency (%)
" " | 84548 | 100.0%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 84548 | 100.0%

Most frequent character per category

Space Separator
Value | Count | Frequency (%)
" " | 84548 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 84548 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
" " | 84548 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 84548 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 84548 | 100.0%

BUILDING CLASS AT PRESENT
Categorical

Distinct: 167
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
D4 | 12663
R4 | 12482
A1 | 6753
A5 | 5683
B2 | 4923
Other values (162) | 42044

Length

Max length: 2
Median length: 2
Mean length: 1.9912712
Min length: 1

Characters and Unicode

Total characters: 168358
Distinct characters: 36
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: C2
2nd row: C7
3rd row: C7
4th row: C4
5th row: C2

Common Values

Value | Count | Frequency (%)
D4 | 12663 | 15.0%
R4 | 12482 | 14.8%
A1 | 6753 | 8.0%
A5 | 5683 | 6.7%
B2 | 4923 | 5.8%
B1 | 4749 | 5.6%
C0 | 4379 | 5.2%
B3 | 3824 | 4.5%
A2 | 2821 | 3.3%
C6 | 2760 | 3.3%
Other values (157) | 23511 | 27.8%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
d4 | 12663 | 15.1%
r4 | 12482 | 14.9%
a1 | 6753 | 8.1%
a5 | 5683 | 6.8%
b2 | 4923 | 5.9%
b1 | 4749 | 5.7%
c0 | 4379 | 5.2%
b3 | 3824 | 4.6%
a2 | 2821 | 3.4%
c6 | 2760 | 3.3%
Other values (156) | 22773 | 27.2%

Most occurring characters

Value | Count | Frequency (%)
4 | 26151 | 15.5%
R | 20291 | 12.1%
A | 17872 | 10.6%
B | 15514 | 9.2%
1 | 15395 | 9.1%
D | 13289 | 7.9%
2 | 10741 | 6.4%
C | 10610 | 6.3%
3 | 7128 | 4.2%
0 | 6512 | 3.9%
Other values (26) | 24855 | 14.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 86498 | 51.4%
Decimal Number | 81122 | 48.2%
Space Separator | 738 | 0.4%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
R | 20291 | 23.5%
A | 17872 | 20.7%
B | 15514 | 17.9%
D | 13289 | 15.4%
C | 10610 | 12.3%
S | 2194 | 2.5%
G | 1805 | 2.1%
V | 1700 | 2.0%
K | 1092 | 1.3%
O | 348 | 0.4%
Other values (15) | 1783 | 2.1%

Decimal Number
Value | Count | Frequency (%)
4 | 26151 | 32.2%
1 | 15395 | 19.0%
2 | 10741 | 13.2%
3 | 7128 | 8.8%
0 | 6512 | 8.0%
5 | 6242 | 7.7%
9 | 4812 | 5.9%
6 | 3182 | 3.9%
7 | 769 | 0.9%
8 | 190 | 0.2%

Space Separator
Value | Count | Frequency (%)
" " | 738 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 86498 | 51.4%
Common | 81860 | 48.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
R | 20291 | 23.5%
A | 17872 | 20.7%
B | 15514 | 17.9%
D | 13289 | 15.4%
C | 10610 | 12.3%
S | 2194 | 2.5%
G | 1805 | 2.1%
V | 1700 | 2.0%
K | 1092 | 1.3%
O | 348 | 0.4%
Other values (15) | 1783 | 2.1%

Common
Value | Count | Frequency (%)
4 | 26151 | 31.9%
1 | 15395 | 18.8%
2 | 10741 | 13.1%
3 | 7128 | 8.7%
0 | 6512 | 8.0%
5 | 6242 | 7.6%
9 | 4812 | 5.9%
6 | 3182 | 3.9%
7 | 769 | 0.9%
" " | 738 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 168358 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
4 | 26151 | 15.5%
R | 20291 | 12.1%
A | 17872 | 10.6%
B | 15514 | 9.2%
1 | 15395 | 9.1%
D | 13289 | 7.9%
2 | 10741 | 6.4%
C | 10610 | 6.3%
3 | 7128 | 4.2%
0 | 6512 | 3.9%
Other values (26) | 24855 | 14.8%

ADDRESS
Categorical

Distinct: 67563
Distinct (%): 79.9%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
131-05 40TH ROAD | 210
429 KENT AVENUE | 158
169 WEST 95TH STREET | 153
131-03 40TH ROAD | 147
265 STATE STREET | 127
Other values (67558) | 83753

Length

Max length: 34
Median length: 30
Mean length: 19.262644
Min length: 5

Characters and Unicode

Total characters: 1628618
Distinct characters: 45
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 62078
Unique (%): 73.4%
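Distinct and Unique measure different things. Distinct counts how many different values occur (67563 addresses, 79.9%). Unique counts the values that occur exactly once (62078, 73.4%). A minimal sketch, with illustrative sample data:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct: number of different values. Unique: values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


addresses = ["429 KENT AVENUE", "429 KENT AVENUE", "30 PARK PLACE", "50 WEST STREET"]
print(distinct_and_unique(addresses))  # (3, 2)
```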

Sample

1st row: 153 AVENUE B
2nd row: 234 EAST 4TH STREET
3rd row: 197 EAST 3RD STREET
4th row: 154 EAST 7TH STREET
5th row: 301 EAST 10TH STREET

Common Values

Value | Count | Frequency (%)
131-05 40TH ROAD | 210 | 0.2%
429 KENT AVENUE | 158 | 0.2%
169 WEST 95TH STREET | 153 | 0.2%
131-03 40TH ROAD | 147 | 0.2%
265 STATE STREET | 127 | 0.2%
550 VANDERBILT AVENUE | 126 | 0.1%
50 WEST STREET | 115 | 0.1%
39TH AVENUE | 108 | 0.1%
30 PARK PLACE | 107 | 0.1%
1809 EMMONS AVENUE | 103 | 0.1%
Other values (67553) | 83194 | 98.4%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
street | 39956 | 13.7%
avenue | 24787 | 8.5%
east | 9802 | 3.4%
west | 6651 | 2.3%
road | 3501 | 1.2%
place | 3055 | 1.1%
ave | 1656 | 0.6%
park | 1435 | 0.5%
boulevard | 1411 | 0.5%
st | 1410 | 0.5%
Other values (21800) | 197250 | 67.8%

Most occurring characters

Value | Count | Frequency (%)
" " | 236125 | 14.5%
E | 191590 | 11.8%
T | 146406 | 9.0%
R | 81143 | 5.0%
1 | 79718 | 4.9%
A | 79687 | 4.9%
S | 77236 | 4.7%
N | 58986 | 3.6%
2 | 52644 | 3.2%
3 | 42496 | 2.6%
Other values (35) | 582587 | 35.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 948427 | 58.2%
Decimal Number | 400922 | 24.6%
Space Separator | 236125 | 14.5%
Dash Punctuation | 25141 | 1.5%
Other Punctuation | 18003 | 1.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 191590 | 20.2%
T | 146406 | 15.4%
R | 81143 | 8.6%
A | 79687 | 8.4%
S | 77236 | 8.1%
N | 58986 | 6.2%
H | 37130 | 3.9%
U | 34792 | 3.7%
V | 34168 | 3.6%
O | 31485 | 3.3%
Other values (16) | 175804 | 18.5%

Decimal Number
Value | Count | Frequency (%)
1 | 79718 | 19.9%
2 | 52644 | 13.1%
3 | 42496 | 10.6%
5 | 40828 | 10.2%
0 | 39582 | 9.9%
4 | 37706 | 9.4%
6 | 30373 | 7.6%
7 | 27885 | 7.0%
8 | 26154 | 6.5%
9 | 23536 | 5.9%

Other Punctuation
Value | Count | Frequency (%)
, | 16730 | 92.9%
/ | 743 | 4.1%
. | 457 | 2.5%
# | 37 | 0.2%
' | 23 | 0.1%
& | 7 | < 0.1%
* | 6 | < 0.1%

Space Separator
Value | Count | Frequency (%)
" " | 236125 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 25141 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 948427 | 58.2%
Common | 680191 | 41.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 191590 | 20.2%
T | 146406 | 15.4%
R | 81143 | 8.6%
A | 79687 | 8.4%
S | 77236 | 8.1%
N | 58986 | 6.2%
H | 37130 | 3.9%
U | 34792 | 3.7%
V | 34168 | 3.6%
O | 31485 | 3.3%
Other values (16) | 175804 | 18.5%

Common
Value | Count | Frequency (%)
" " | 236125 | 34.7%
1 | 79718 | 11.7%
2 | 52644 | 7.7%
3 | 42496 | 6.2%
5 | 40828 | 6.0%
0 | 39582 | 5.8%
4 | 37706 | 5.5%
6 | 30373 | 4.5%
7 | 27885 | 4.1%
8 | 26154 | 3.8%
Other values (9) | 66680 | 9.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1628618 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 236125 | 14.5%
E | 191590 | 11.8%
T | 146406 | 9.0%
R | 81143 | 5.0%
1 | 79718 | 4.9%
A | 79687 | 4.9%
S | 77236 | 4.7%
N | 58986 | 3.6%
2 | 52644 | 3.2%
3 | 42496 | 2.6%
Other values (35) | 582587 | 35.8%

APARTMENT NUMBER
Categorical

Distinct: 3989
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
Value | Count
" " | 65496
4 | 298
3A | 295
2 | 275
3B | 275
Other values (3984) | 17909

Length

Max length: 11
Median length: 1
Mean length: 1.3446563
Min length: 1

Characters and Unicode

Total characters: 113688
Distinct characters: 48
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2458
Unique (%): 2.9%

Sample

1st row: " "
2nd row: " "
3rd row: " "
4th row: " "
5th row: " "

Common Values

Value | Count | Frequency (%)
" " | 65496 | 77.5%
4 | 298 | 0.4%
3A | 295 | 0.3%
2 | 275 | 0.3%
3B | 275 | 0.3%
2B | 272 | 0.3%
3 | 263 | 0.3%
2A | 263 | 0.3%
1 | 242 | 0.3%
4B | 228 | 0.3%
Other values (3979) | 16641 | 19.7%
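The dataset reports 0 missing cells, yet some columns use placeholders where data is absent. EASE-MENT is a single space in every row. 65496 APARTMENT NUMBER values (77.5%) are a single space. The most frequent LAND SQUARE FEET value is "-". The sketch below treats such placeholders as missing before profiling. The placeholder set (blank after stripping, or a lone "-") is an assumption to adjust per column, and the helper names are hypothetical.

```python
PLACEHOLDERS = {"", "-"}  # assumed: blank after stripping, or a lone dash


def normalize(value):
    """Hypothetical helper: None for placeholder strings, else the stripped value."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped in PLACEHOLDERS else stripped


def missing_share(values):
    """Fraction of values that normalize() turns into None."""
    return sum(normalize(v) is None for v in values) / len(values)


sample = [" ", "4", "-", "3A ", " "]
print([normalize(v) for v in sample])  # [None, '4', None, '3A', None]
print(missing_share(sample))           # 0.6
```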

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
4 | 309 | 1.6%
3a | 295 | 1.5%
2 | 285 | 1.5%
3b | 275 | 1.4%
2b | 274 | 1.4%
3 | 270 | 1.4%
2a | 264 | 1.4%
1 | 248 | 1.3%
4b | 228 | 1.2%
4a | 206 | 1.1%
Other values (3810) | 16605 | 86.2%

Most occurring characters

Value | Count | Frequency (%)
" " | 65703 | 57.8%
1 | 6322 | 5.6%
2 | 4521 | 4.0%
3 | 3568 | 3.1%
4 | 3105 | 2.7%
A | 2640 | 2.3%
5 | 2547 | 2.2%
B | 2430 | 2.1%
0 | 2389 | 2.1%
6 | 2131 | 1.9%
Other values (38) | 18332 | 16.1%

Most occurring categories

Value | Count | Frequency (%)
Space Separator | 65703 | 57.8%
Decimal Number | 28928 | 25.4%
Uppercase Letter | 18172 | 16.0%
Dash Punctuation | 808 | 0.7%
Other Punctuation | 65 | 0.1%
Math Symbol | 4 | < 0.1%
Modifier Symbol | 2 | < 0.1%
Open Punctuation | 2 | < 0.1%
Close Punctuation | 2 | < 0.1%
Lowercase Letter | 2 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 2640 | 14.5%
B | 2430 | 13.4%
C | 1947 | 10.7%
P | 1732 | 9.5%
D | 1379 | 7.6%
E | 1144 | 6.3%
H | 1025 | 5.6%
F | 872 | 4.8%
S | 839 | 4.6%
G | 758 | 4.2%
Other values (16) | 3406 | 18.7%

Decimal Number
Value | Count | Frequency (%)
1 | 6322 | 21.9%
2 | 4521 | 15.6%
3 | 3568 | 12.3%
4 | 3105 | 10.7%
5 | 2547 | 8.8%
0 | 2389 | 8.3%
6 | 2131 | 7.4%
7 | 1635 | 5.7%
8 | 1478 | 5.1%
9 | 1232 | 4.3%

Other Punctuation
Value | Count | Frequency (%)
/ | 43 | 66.2%
. | 13 | 20.0%
& | 7 | 10.8%
# | 2 | 3.1%

Lowercase Letter
Value | Count | Frequency (%)
b | 1 | 50.0%
c | 1 | 50.0%

Space Separator
Value | Count | Frequency (%)
" " | 65703 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 808 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 4 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 2 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 95514 | 84.0%
Latin | 18174 | 16.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 2640 | 14.5%
B | 2430 | 13.4%
C | 1947 | 10.7%
P | 1732 | 9.5%
D | 1379 | 7.6%
E | 1144 | 6.3%
H | 1025 | 5.6%
F | 872 | 4.8%
S | 839 | 4.6%
G | 758 | 4.2%
Other values (18) | 3408 | 18.8%

Common
Value | Count | Frequency (%)
" " | 65703 | 68.8%
1 | 6322 | 6.6%
2 | 4521 | 4.7%
3 | 3568 | 3.7%
4 | 3105 | 3.3%
5 | 2547 | 2.7%
0 | 2389 | 2.5%
6 | 2131 | 2.2%
7 | 1635 | 1.7%
8 | 1478 | 1.5%
Other values (10) | 2115 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 113688 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
" " | 65703 | 57.8%
1 | 6322 | 5.6%
2 | 4521 | 4.0%
3 | 3568 | 3.1%
4 | 3105 | 2.7%
A | 2640 | 2.3%
5 | 2547 | 2.2%
B | 2430 | 2.1%
0 | 2389 | 2.1%
6 | 2131 | 1.9%
Other values (38) | 18332 | 16.1%

ZIP CODE
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 186
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10731.992
Minimum: 0
Maximum: 11694
Zeros: 982
Zeros (%): 1.2%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10011
Q1: 10305
Median: 11209
Q3: 11357
95-th percentile: 11427
Maximum: 11694
Range: 11694
Interquartile range (IQR): 1052

Descriptive statistics

Standard deviation: 1290.8791
Coefficient of variation (CV): 0.12028328
Kurtosis: 52.539297
Mean: 10731.992
Median Absolute Deviation (MAD): 206
Skewness: -6.6563208
Sum: 9.0736843 × 10⁸
Variance: 1666369
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
10314 | 1687 | 2.0%
11354 | 1384 | 1.6%
11201 | 1324 | 1.6%
11235 | 1312 | 1.6%
11234 | 1165 | 1.4%
11375 | 1144 | 1.4%
10312 | 1088 | 1.3%
10306 | 1061 | 1.3%
10023 | 1053 | 1.2%
10011 | 1048 | 1.2%
Other values (176) | 72282 | 85.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 982 | 1.2%
10001 | 204 | 0.2%
10002 | 328 | 0.4%
10003 | 812 | 1.0%
10004 | 95 | 0.1%
10005 | 199 | 0.2%
10006 | 184 | 0.2%
10007 | 313 | 0.4%
10009 | 244 | 0.3%
10010 | 459 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
11694 | 273 | 0.3%
11693 | 142 | 0.2%
11692 | 157 | 0.2%
11691 | 435 | 0.5%
11436 | 312 | 0.4%
11435 | 525 | 0.6%
11434 | 705 | 0.8%
11433 | 442 | 0.5%
11432 | 533 | 0.6%
11430 | 1 | < 0.1%

RESIDENTIAL UNITS
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 176
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0252638
Minimum: 0
Maximum: 1844
Zeros: 24783
Zeros (%): 29.3%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 2
95-th percentile: 4
Maximum: 1844
Range: 1844
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 16.721037
Coefficient of variation (CV): 8.2562269
Kurtosis: 5299.9341
Mean: 2.0252638
Median Absolute Deviation (MAD): 1
Skewness: 60.702733
Sum: 171232
Variance: 279.59308
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
1 | 34722 | 41.1%
0 | 24783 | 29.3%
2 | 16049 | 19.0%
3 | 4608 | 5.5%
4 | 1346 | 1.6%
6 | 787 | 0.9%
8 | 332 | 0.4%
5 | 273 | 0.3%
10 | 145 | 0.2%
16 | 122 | 0.1%
Other values (166) | 1381 | 1.6%

Minimum 10 values
Value | Count | Frequency (%)
0 | 24783 | 29.3%
1 | 34722 | 41.1%
2 | 16049 | 19.0%
3 | 4608 | 5.5%
4 | 1346 | 1.6%
5 | 273 | 0.3%
6 | 787 | 0.9%
7 | 121 | 0.1%
8 | 332 | 0.4%
9 | 113 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1844 | 2 | < 0.1%
1641 | 1 | < 0.1%
948 | 1 | < 0.1%
894 | 1 | < 0.1%
889 | 1 | < 0.1%
771 | 3 | < 0.1%
716 | 2 | < 0.1%
680 | 2 | < 0.1%
550 | 1 | < 0.1%
529 | 1 | < 0.1%

COMMERCIAL UNITS
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 55
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.19355869
Minimum: 0
Maximum: 2261
Zeros: 79429
Zeros (%): 93.9%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 1
Maximum: 2261
Range: 2261
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.7131834
Coefficient of variation (CV): 45.015718
Kurtosis: 53950.593
Mean: 0.19355869
Median Absolute Deviation (MAD): 0
Skewness: 214.40112
Sum: 16365
Variance: 75.919564
Monotonicity: Not monotonic
2022-12-11T00:35:42.427708image/svg+xmlMatplotlib v3.6.2, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
0 79429 93.9%
1 3558 4.2%
2 817 1.0%
3 259 0.3%
4 137 0.2%
5 74 0.1%
6 70 0.1%
7 31 < 0.1%
8 26 < 0.1%
9 20 < 0.1%
Other values (45) 127 0.2%

Minimum 10 values
Value Count Frequency (%)
0 79429 93.9%
1 3558 4.2%
2 817 1.0%
3 259 0.3%
4 137 0.2%
5 74 0.1%
6 70 0.1%
7 31 < 0.1%
8 26 < 0.1%
9 20 < 0.1%

Maximum 10 values
Value Count Frequency (%)
2261 1 < 0.1%
436 2 < 0.1%
422 2 < 0.1%
318 1 < 0.1%
254 4 < 0.1%
184 1 < 0.1%
172 1 < 0.1%
147 1 < 0.1%
126 2 < 0.1%
91 1 < 0.1%

TOTAL UNITS
Real number (ℝ)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 192
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.2491839
Minimum: 0
Maximum: 2261
Zeros: 19762
Zeros (%): 23.4%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 1
Q3: 2
95th percentile: 4
Maximum: 2261
Range: 2261
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 18.972584
Coefficient of variation (CV): 8.4353193
Kurtosis: 5719.5837
Mean: 2.2491839
Median Absolute Deviation (MAD): 1
Skewness: 63.448337
Sum: 190164
Variance: 359.95896
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
1 38356 45.4%
0 19762 23.4%
2 15914 18.8%
3 5412 6.4%
4 1498 1.8%
6 870 1.0%
5 423 0.5%
8 374 0.4%
10 198 0.2%
7 197 0.2%
Other values (182) 1544 1.8%

Minimum 10 values
Value Count Frequency (%)
0 19762 23.4%
1 38356 45.4%
2 15914 18.8%
3 5412 6.4%
4 1498 1.8%
5 423 0.5%
6 870 1.0%
7 197 0.2%
8 374 0.4%
9 142 0.2%

Maximum 10 values
Value Count Frequency (%)
2261 1 < 0.1%
1866 2 < 0.1%
1653 1 < 0.1%
955 1 < 0.1%
902 1 < 0.1%
889 1 < 0.1%
771 3 < 0.1%
736 2 < 0.1%
680 2 < 0.1%
551 1 < 0.1%

LAND SQUARE FEET
Categorical

Distinct: 6062
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
- 26252
0 10326
2000 3919
2500 3470
4000 3044
Other values (6057) 37537

Length

Max length: 7
Median length: 4
Mean length: 3.6486729
Min length: 1

Characters and Unicode

Total characters: 308488
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
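Python's standard `unicodedata` module exposes the general category of each character, which is enough to reproduce counts like "Most occurring categories" below. In LAND SQUARE FEET the space count (78756) is exactly three times the dash count (26252), which suggests each "-" placeholder is stored with three padding spaces; the sample input below is written that way. The mapping from two-letter category codes to the names shown in this report is written out by hand and covers only the categories that occur here.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general-category codes mapped to the names this report
# displays; only the categories that occur in these columns are listed.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not in the table.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["2000", " -  ", "R4", "2017-06-29 00:00:00"]))
```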

Unique

Unique: 2675
Unique (%): 3.2%
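The report distinguishes Distinct (how many different values occur) from Unique. Assuming Unique counts the values that occur exactly once, which is consistent with a later summary that shows 4 distinct values but 0 unique ones for a column whose rarest value, 3, still occurs 4 times, both numbers can be computed with `collections.Counter`. The helper name is illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) for a column of values.

    distinct: number of different values.
    unique:   number of values that occur exactly once (assumed definition).
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return distinct, unique


# The placeholder "-" appears twice, so it is distinct but not unique.
print(distinct_and_unique(["-", "-", "0", "2000", "1633"]))  # → (4, 3)
```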

Sample

1st row: 1633
2nd row: 4616
3rd row: 2212
4th row: 2272
5th row: 2369

Common Values

Value Count Frequency (%)
- 26252 31.0%
0 10326 12.2%
2000 3919 4.6%
2500 3470 4.1%
4000 3044 3.6%
1800 1192 1.4%
3000 1190 1.4%
5000 1009 1.2%
2200 512 0.6%
2400 486 0.6%
Other values (6052) 33148 39.2%

Length

Histogram of lengths of the category

Words
Value Count Frequency (%)
(empty) 26252 31.0%
0 10326 12.2%
2000 3919 4.6%
2500 3470 4.1%
4000 3044 3.6%
1800 1192 1.4%
3000 1190 1.4%
5000 1009 1.2%
2200 512 0.6%
2400 486 0.6%
Other values (6052) 33148 39.2%

Most occurring characters

Value Count Frequency (%)
(space) 78756 25.5%
0 78009 25.3%
2 28715 9.3%
- 26252 8.5%
5 17835 5.8%
1 17008 5.5%
3 13832 4.5%
4 13603 4.4%
8 9653 3.1%
6 9217 3.0%
Other values (2) 15608 5.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 203480 66.0%
Space Separator 78756 25.5%
Dash Punctuation 26252 8.5%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 78009 38.3%
2 28715 14.1%
5 17835 8.8%
1 17008 8.4%
3 13832 6.8%
4 13603 6.7%
8 9653 4.7%
6 9217 4.5%
7 8986 4.4%
9 6622 3.3%
Space Separator
Value Count Frequency (%)
(space) 78756 100.0%
Dash Punctuation
Value Count Frequency (%)
- 26252 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 308488 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
(space) 78756 25.5%
0 78009 25.3%
2 28715 9.3%
- 26252 8.5%
5 17835 5.8%
1 17008 5.5%
3 13832 4.5%
4 13603 4.4%
8 9653 3.1%
6 9217 3.0%
Other values (2) 15608 5.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 308488 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 78756 25.5%
0 78009 25.3%
2 28715 9.3%
- 26252 8.5%
5 17835 5.8%
1 17008 5.5%
3 13832 4.5%
4 13603 4.4%
8 9653 3.1%
6 9217 3.0%
Other values (2) 15608 5.1%

GROSS SQUARE FEET
Categorical

Distinct: 5691
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
- 27612
0 11417
2400 386
1800 361
2000 359
Other values (5686) 44413

Length

Max length: 7
Median length: 4
Mean length: 3.5957799
Min length: 1

Characters and Unicode

Total characters: 304016
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2365
Unique (%): 2.8%

Sample

1st row: 6440
2nd row: 18690
3rd row: 7803
4th row: 6794
5th row: 4615

Common Values

Value Count Frequency (%)
- 27612 32.7%
0 11417 13.5%
2400 386 0.5%
1800 361 0.4%
2000 359 0.4%
1600 346 0.4%
1440 340 0.4%
3000 324 0.4%
1200 295 0.3%
1280 281 0.3%
Other values (5681) 42827 50.7%

Length

Histogram of lengths of the category

Words
Value Count Frequency (%)
(empty) 27612 32.7%
0 11417 13.5%
2400 386 0.5%
1800 361 0.4%
2000 359 0.4%
1600 346 0.4%
1440 340 0.4%
3000 324 0.4%
1200 295 0.3%
1280 281 0.3%
Other values (5681) 42827 50.7%

Most occurring characters

Value Count Frequency (%)
(space) 82836 27.2%
0 45165 14.9%
1 30980 10.2%
2 28708 9.4%
- 27612 9.1%
4 15934 5.2%
3 14746 4.9%
6 14576 4.8%
8 14498 4.8%
5 12031 4.0%
Other values (2) 16930 5.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 193568 63.7%
Space Separator 82836 27.2%
Dash Punctuation 27612 9.1%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 45165 23.3%
1 30980 16.0%
2 28708 14.8%
4 15934 8.2%
3 14746 7.6%
6 14576 7.5%
8 14498 7.5%
5 12031 6.2%
9 8593 4.4%
7 8337 4.3%
Space Separator
Value Count Frequency (%)
(space) 82836 100.0%
Dash Punctuation
Value Count Frequency (%)
- 27612 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 304016 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
(space) 82836 27.2%
0 45165 14.9%
1 30980 10.2%
2 28708 9.4%
- 27612 9.1%
4 15934 5.2%
3 14746 4.9%
6 14576 4.8%
8 14498 4.8%
5 12031 4.0%
Other values (2) 16930 5.6%

Most occurring blocks

Value Count Frequency (%)
ASCII 304016 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 82836 27.2%
0 45165 14.9%
1 30980 10.2%
2 28708 9.4%
- 27612 9.1%
4 15934 5.2%
3 14746 4.9%
6 14576 4.8%
8 14498 4.8%
5 12031 4.0%
Other values (2) 16930 5.6%

YEAR BUILT
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 158
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1789.323
Minimum: 0
Maximum: 2017
Zeros: 6970
Zeros (%): 8.2%
Negative: 0
Negative (%): 0.0%
Memory size: 660.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1920
median: 1940
Q3: 1965
95th percentile: 2013
Maximum: 2017
Range: 2017
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 537.34499
Coefficient of variation (CV): 0.30030632
Kurtosis: 7.1463801
Mean: 1789.323
Median Absolute Deviation (MAD): 23
Skewness: -3.016062
Sum: 1.5128368 × 10^8
Variance: 288739.64
Monotonicity: Not monotonic
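The 6970 zeros in YEAR BUILT (8.2%) are presumably placeholders for an unknown year. They pull the mean (1789.323) far below the median (1940) and inflate the standard deviation, while quartile-based statistics such as the IQR and the MAD are much less sensitive to them. A sketch of those robust statistics with the standard `statistics` module follows; it assumes linear interpolation between order statistics (`statistics.quantiles(..., method="inclusive")`, the same rule as NumPy's and pandas' default), which may not match the profiler exactly.

```python
import statistics


def robust_summary(values):
    """Quartiles, IQR and median absolute deviation (MAD) of a numeric column.

    Quartiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive"); needs at least two values.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    center = statistics.median(values)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median(abs(x - center) for x in values)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1, "mad": mad}


# Toy YEAR BUILT values, including one 0 placeholder.
print(robust_summary([0, 1900, 1920, 1930, 1940, 1965, 2015]))
```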
Histogram with fixed size bins (bins=50)
Common values
Value Count Frequency (%)
0 6970 8.2%
1920 6045 7.1%
1930 5043 6.0%
1925 4312 5.1%
1910 3585 4.2%
1950 3156 3.7%
1960 2654 3.1%
1940 2456 2.9%
1931 2246 2.7%
1955 1961 2.3%
Other values (148) 46120 54.5%

Minimum 10 values
Value Count Frequency (%)
0 6970 8.2%
1111 1 < 0.1%
1680 1 < 0.1%
1800 37 < 0.1%
1826 1 < 0.1%
1829 1 < 0.1%
1832 1 < 0.1%
1835 2 < 0.1%
1840 2 < 0.1%
1844 2 < 0.1%

Maximum 10 values
Value Count Frequency (%)
2017 6 < 0.1%
2016 794 0.9%
2015 1470 1.7%
2014 1232 1.5%
2013 743 0.9%
2012 276 0.3%
2011 154 0.2%
2010 358 0.4%
2009 579 0.7%
2008 935 1.1%

TAX CLASS AT TIME OF SALE
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
1 41533
2 36726
4 6285
3 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 84548
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words
Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

Most occurring characters

Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 84548 100.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

Most occurring scripts

Value Count Frequency (%)
Common 84548 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 84548 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 41533 49.1%
2 36726 43.4%
4 6285 7.4%
3 4 < 0.1%

BUILDING CLASS AT TIME OF SALE
Categorical

Distinct: 166
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
R4 12989
D4 12666
A1 6751
A5 5671
B2 4918
Other values (161) 41553

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 169096
Distinct characters: 35
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): < 0.1%

Sample

1st row: C2
2nd row: C7
3rd row: C7
4th row: C4
5th row: C2

Common Values

Value Count Frequency (%)
R4 12989 15.4%
D4 12666 15.0%
A1 6751 8.0%
A5 5671 6.7%
B2 4918 5.8%
B1 4747 5.6%
C0 4384 5.2%
B3 3821 4.5%
A2 2867 3.4%
C6 2760 3.3%
Other values (156) 22974 27.2%

Length

Histogram of lengths of the category

Words
Value Count Frequency (%)
r4 12989 15.4%
d4 12666 15.0%
a1 6751 8.0%
a5 5671 6.7%
b2 4918 5.8%
b1 4747 5.6%
c0 4384 5.2%
b3 3821 4.5%
a2 2867 3.4%
c6 2760 3.3%
Other values (156) 22974 27.2%

Most occurring characters

Value Count Frequency (%)
4 26664 15.8%
R 21018 12.4%
A 17875 10.6%
B 15508 9.2%
1 15445 9.1%
D 13284 7.9%
2 10784 6.4%
C 10617 6.3%
3 7129 4.2%
0 6488 3.8%
Other values (25) 24284 14.4%

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 87366 51.7%
Decimal Number 81730 48.3%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
R 21018 24.1%
A 17875 20.5%
B 15508 17.8%
D 13284 15.2%
C 10617 12.2%
S 2221 2.5%
G 1873 2.1%
V 1711 2.0%
K 1089 1.2%
P 353 0.4%
Other values (15) 1817 2.1%
Decimal Number
Value Count Frequency (%)
4 26664 32.6%
1 15445 18.9%
2 10784 13.2%
3 7129 8.7%
0 6488 7.9%
5 6242 7.6%
9 4827 5.9%
6 3198 3.9%
7 767 0.9%
8 186 0.2%

Most occurring scripts

Value Count Frequency (%)
Latin 87366 51.7%
Common 81730 48.3%

Most frequent character per script

Latin
Value Count Frequency (%)
R 21018 24.1%
A 17875 20.5%
B 15508 17.8%
D 13284 15.2%
C 10617 12.2%
S 2221 2.5%
G 1873 2.1%
V 1711 2.0%
K 1089 1.2%
P 353 0.4%
Other values (15) 1817 2.1%
Common
Value Count Frequency (%)
4 26664 32.6%
1 15445 18.9%
2 10784 13.2%
3 7129 8.7%
0 6488 7.9%
5 6242 7.6%
9 4827 5.9%
6 3198 3.9%
7 767 0.9%
8 186 0.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 169096 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
4 26664 15.8%
R 21018 12.4%
A 17875 10.6%
B 15508 9.2%
1 15445 9.1%
D 13284 7.9%
2 10784 6.4%
C 10617 6.3%
3 7129 4.2%
0 6488 3.8%
Other values (25) 24284 14.4%

SALE PRICE
Categorical

Distinct: 10008
Distinct (%): 11.8%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
- 14561
0 10228
10 766
450000 427
550000 416
Other values (10003) 58150

Length

Max length: 10
Median length: 9
Mean length: 5.1760302
Min length: 1

Characters and Unicode

Total characters: 437623
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6916
Unique (%): 8.2%

Sample

1st row: 6625000
2nd row: -
3rd row: -
4th row: 3936272
5th row: 8000000

Common Values

Value Count Frequency (%)
- 14561 17.2%
0 10228 12.1%
10 766 0.9%
450000 427 0.5%
550000 416 0.5%
650000 414 0.5%
600000 409 0.5%
700000 382 0.5%
400000 378 0.4%
750000 377 0.4%
Other values (9998) 56190 66.5%

Length

Histogram of lengths of the category

Words
Value Count Frequency (%)
(empty) 14561 17.2%
0 10228 12.1%
10 766 0.9%
450000 427 0.5%
550000 416 0.5%
650000 414 0.5%
600000 409 0.5%
700000 382 0.5%
400000 378 0.4%
750000 377 0.4%
Other values (9998) 56190 66.5%

Most occurring characters

Value Count Frequency (%)
0 201287 46.0%
(space) 43683 10.0%
5 35469 8.1%
1 24639 5.6%
2 20911 4.8%
3 17749 4.1%
4 16596 3.8%
7 16167 3.7%
9 16072 3.7%
6 15734 3.6%
Other values (2) 29316 6.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 379379 86.7%
Space Separator 43683 10.0%
Dash Punctuation 14561 3.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 201287 53.1%
5 35469 9.3%
1 24639 6.5%
2 20911 5.5%
3 17749 4.7%
4 16596 4.4%
7 16167 4.3%
9 16072 4.2%
6 15734 4.1%
8 14755 3.9%
Space Separator
Value Count Frequency (%)
(space) 43683 100.0%
Dash Punctuation
Value Count Frequency (%)
- 14561 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 437623 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 201287 46.0%
(space) 43683 10.0%
5 35469 8.1%
1 24639 5.6%
2 20911 4.8%
3 17749 4.1%
4 16596 3.8%
7 16167 3.7%
9 16072 3.7%
6 15734 3.6%
Other values (2) 29316 6.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 437623 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 201287 46.0%
(space) 43683 10.0%
5 35469 8.1%
1 24639 5.6%
2 20911 4.8%
3 17749 4.1%
4 16596 3.8%
7 16167 3.7%
9 16072 3.7%
6 15734 3.6%
Other values (2) 29316 6.7%

SALE DATE
Categorical

Distinct: 364
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 660.7 KiB
2017-06-29 00:00:00 544
2017-06-15 00:00:00 530
2016-12-22 00:00:00 527
2017-05-25 00:00:00 511
2016-10-06 00:00:00 508
Other values (359) 81928

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 1606412
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 2017-07-19 00:00:00
2nd row: 2016-12-14 00:00:00
3rd row: 2016-12-09 00:00:00
4th row: 2016-09-23 00:00:00
5th row: 2016-11-17 00:00:00

Common Values

Value Count Frequency (%)
2017-06-29 00:00:00 544 0.6%
2017-06-15 00:00:00 530 0.6%
2016-12-22 00:00:00 527 0.6%
2017-05-25 00:00:00 511 0.6%
2016-10-06 00:00:00 508 0.6%
2017-06-30 00:00:00 493 0.6%
2017-03-30 00:00:00 493 0.6%
2016-10-28 00:00:00 493 0.6%
2016-09-22 00:00:00 489 0.6%
2016-09-29 00:00:00 474 0.6%
Other values (354) 79486 94.0%

Length

Histogram of lengths of the category

Words
Value Count Frequency (%)
00:00:00 84548 50.0%
2017-06-29 544 0.3%
2017-06-15 530 0.3%
2016-12-22 527 0.3%
2017-05-25 511 0.3%
2016-10-06 508 0.3%
2017-06-30 493 0.3%
2017-03-30 493 0.3%
2016-10-28 493 0.3%
2016-09-22 489 0.3%
Other values (355) 79960 47.3%

Most occurring characters

Value Count Frequency (%)
0 693670 43.2%
- 169096 10.5%
: 169096 10.5%
1 157522 9.8%
2 135506 8.4%
(space) 84548 5.3%
7 71062 4.4%
6 46100 2.9%
3 21154 1.3%
9 15627 1.0%
Other values (3) 43031 2.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1183672 73.7%
Dash Punctuation 169096 10.5%
Other Punctuation 169096 10.5%
Space Separator 84548 5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 693670 58.6%
1 157522 13.3%
2 135506 11.4%
7 71062 6.0%
6 46100 3.9%
3 21154 1.8%
9 15627 1.3%
5 14766 1.2%
8 14473 1.2%
4 13792 1.2%
Dash Punctuation
Value Count Frequency (%)
- 169096 100.0%
Other Punctuation
Value Count Frequency (%)
: 169096 100.0%
Space Separator
Value Count Frequency (%)
(space) 84548 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1606412 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 693670 43.2%
- 169096 10.5%
: 169096 10.5%
1 157522 9.8%
2 135506 8.4%
(space) 84548 5.3%
7 71062 4.4%
6 46100 2.9%
3 21154 1.3%
9 15627 1.0%
Other values (3) 43031 2.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 1606412 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 693670 43.2%
- 169096 10.5%
: 169096 10.5%
1 157522 9.8%
2 135506 8.4%
(space) 84548 5.3%
7 71062 4.4%
6 46100 2.9%
3 21154 1.3%
9 15627 1.0%
Other values (3) 43031 2.7%

Interactions


Correlations


Auto

The auto setting picks an interpretable pairwise metric for each pair of columns, based on their variable types:
  • Variable type pair: method, range
  • Categorical-Categorical: Cramér's V, [0, 1]
  • Numerical-Categorical: Cramér's V, [0, 1] (computed on a discretized numerical column)
  • Numerical-Numerical: Spearman's ρ, [-1, 1]
The number of bins used to discretize the numerical column of a Numerical-Categorical pair can be changed with config.correlations["auto"].n_bins. It controls the granularity of the association being measured.

This configuration uses the recommended metric for each pair of columns.
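For a Numerical-Categorical pair, the numerical column first has to be turned into categories. A minimal sketch of that step, assuming equal-width bins (the report does not say which binning rule the profiler uses); the `n_bins` argument plays the role of config.correlations["auto"].n_bins, and the function name is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width bins (0-based).

    Assumed binning rule: equal widths between min and max. The maximum value
    goes into the last bin; a constant column maps everything to bin 0.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in values]


# Fewer bins give a coarser view of the association.
years = [1900, 1920, 1931, 1955, 1965, 2015]
print(equal_width_bins(years, n_bins=2))  # → [0, 0, 0, 0, 1, 1]
print(equal_width_bins(years, n_bins=5))  # → [0, 0, 1, 2, 2, 4]
```

The binned labels can then be cross-tabulated against the categorical column and passed to a Cramér's V computation.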

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic correlation, 0 no monotonic correlation, and +1 a perfectly positive monotonic correlation.

To calculate ρ for two variables X and Y, replace each value by its rank (tied values receive the average of their ranks), then divide the covariance of the ranks of X and Y by the product of the standard deviations of those ranks.
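The rank-then-correlate recipe above can be sketched in plain Python. Ties receive average ranks, as in the usual definition; the helper names are illustrative, and the result is undefined (division by zero) if either input is constant.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    # The 1/n factors cancel, so plain sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Monotonic but nonlinear: rho is 1 up to floating-point rounding.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```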

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear correlation, 0 no linear correlation, and +1 a perfectly positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
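A direct transcription of that definition, useful for checking the invariance claim above on toy data (the function name is illustrative):

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs.

    The 1/n factors in the covariance and standard deviations cancel,
    so plain sums are enough. Undefined if either input is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
print(pearson_r(x, [3 * v + 10 for v in x]))  # positive linear map
print(pearson_r(x, [-0.5 * v for v in x]))    # negative slope flips the sign
```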

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative correlation, 0 no correlation, and +1 a perfectly positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
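Counting pairs exactly as described takes O(n²) time and yields the variant known as τ-a. SciPy's `kendalltau`, which pandas uses for `method="kendall"`, defaults to τ-b, which additionally corrects for ties, so the two can differ when ties are present. A minimal sketch of τ-a:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y: neither concordant nor discordant.
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```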

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V is known to be biased, even for large samples, so the bias-corrected measure proposed by Bergsma (2013) is used instead.
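A minimal sketch of a bias-corrected Cramér's V for an r × k table of counts, following the correction as it is usually stated: φ² is shrunk by (k-1)(r-1)/(n-1) and the table dimensions are adjusted in the same spirit. It uses the plain Pearson χ² without a continuity correction and assumes no all-zero row or column; a library implementation may handle both points differently, so treat this as illustrative rather than as the profiler's exact code.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts).

    Assumes at least two rows and two columns and no all-zero row or column.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson's chi-squared statistic, without continuity correction.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denominator) if denominator > 0 else 0.0


print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent rows
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
```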

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.

Sample

Unnamed: 0BOROUGHNEIGHBORHOODBUILDING CLASS CATEGORYTAX CLASS AT PRESENTBLOCKLOTEASE-MENTBUILDING CLASS AT PRESENTADDRESSAPARTMENT NUMBERZIP CODERESIDENTIAL UNITSCOMMERCIAL UNITSTOTAL UNITSLAND SQUARE FEETGROSS SQUARE FEETYEAR BUILTTAX CLASS AT TIME OF SALEBUILDING CLASS AT TIME OF SALESALE PRICESALE DATE
041ALPHABET CITY07 RENTALS - WALKUP APARTMENTS2A3926C2153 AVENUE B100095051633644019002C266250002017-07-19 00:00:00
151ALPHABET CITY07 RENTALS - WALKUP APARTMENTS239926C7234 EAST 4TH STREET100092833146161869019002C7-2016-12-14 00:00:00
261ALPHABET CITY07 RENTALS - WALKUP APARTMENTS239939C7197 EAST 3RD STREET10009161172212780319002C7-2016-12-09 00:00:00
371ALPHABET CITY07 RENTALS - WALKUP APARTMENTS2B40221C4154 EAST 7TH STREET10009100102272679419132C439362722016-09-23 00:00:00
481ALPHABET CITY07 RENTALS - WALKUP APARTMENTS2A40455C2301 EAST 10TH STREET100096062369461519002C280000002016-11-17 00:00:00
591ALPHABET CITY07 RENTALS - WALKUP APARTMENTS240516C4516 EAST 12TH STREET10009200202581973019002C4-2017-07-20 00:00:00
6101ALPHABET CITY07 RENTALS - WALKUP APARTMENTS2B40632C4210 AVENUE B100098081750422619202C431928402016-09-23 00:00:00
7111ALPHABET CITY07 RENTALS - WALKUP APARTMENTS240718C7520 EAST 14TH STREET100094424651632100719002C7-2017-07-20 00:00:00
8121ALPHABET CITY08 RENTALS - ELEVATOR APARTMENTS237934D5141 AVENUE D10009150151534919819202D5-2017-06-20 00:00:00
9131ALPHABET CITY08 RENTALS - ELEVATOR APARTMENTS2387153D9629 EAST 5TH STREET100092402444891852319202D9162320002016-11-07 00:00:00
Unnamed: 0BOROUGHNEIGHBORHOODBUILDING CLASS CATEGORYTAX CLASS AT PRESENTBLOCKLOTEASE-MENTBUILDING CLASS AT PRESENTADDRESSAPARTMENT NUMBERZIP CODERESIDENTIAL UNITSCOMMERCIAL UNITSTOTAL UNITSLAND SQUARE FEETGROSS SQUARE FEETYEAR BUILTTAX CLASS AT TIME OF SALEBUILDING CLASS AT TIME OF SALESALE PRICESALE DATE
8453884045WOODROW02 TWO FAMILY DWELLINGS1731661B2178 DARNELL LANE103092023215130019951B2-2017-06-30 00:00:00
8453984055WOODROW02 TWO FAMILY DWELLINGS1731685B2137 DARNELL LANE103092023016130019951B2-2016-12-30 00:00:00
8454084065WOODROW02 TWO FAMILY DWELLINGS1731693B2125 DARNELL LANE103092023325130019951B25090002016-10-31 00:00:00
8454184075WOODROW02 TWO FAMILY DWELLINGS17317126B2112 ROBIN COURT1030920211088216019941B26480002016-12-07 00:00:00
8454284085WOODROW02 TWO FAMILY DWELLINGS1733941B941 SONIA COURT103092023020180019971B9-2016-12-01 00:00:00
8454384095WOODROW02 TWO FAMILY DWELLINGS1734934B937 QUAIL LANE103092022400257519981B94500002016-11-28 00:00:00
8454484105WOODROW02 TWO FAMILY DWELLINGS1734978B932 PHEASANT LANE103092022498237719981B95500002017-04-21 00:00:00
8454584115WOODROW02 TWO FAMILY DWELLINGS1735160B249 PITNEY AVENUE103092024000149619251B24600002017-07-05 00:00:00
8454684125WOODROW22 STORE BUILDINGS4710028K62730 ARTHUR KILL ROAD103090772080336411720014K6116933372016-12-21 00:00:00
8454784135WOODROW35 INDOOR PUBLIC AND CULTURAL FACILITIES47105679P9155 CLAY PIT ROAD1030901110796240020064P9693002016-10-27 00:00:00